ISSUES CONCERNING THE EVALUATION OF MEDICAL STUDENTS' ABILITIES TO FORMULATE
PROBLEM LISTS
To help determine the role that examining-instrument formats play in evaluation, two parallel exams were given to 227 second-year medical students. One required students to generate their own problem lists (the generate group); the other required students to select problem lists from a list of alternatives (the select group). All of these second-year medical students had difficulty formulating problem lists, as indicated by average overall scores of 42% and 57% correct for the generate and the select groups, respectively. Significant quantitative and qualitative differences were noted between the two groups, in that the select group usually picked properly integrated problems, while the generate group gave partially correct answers composed of unintegrated cues. As predicted, the select group scored significantly higher than the group generating their own lists. The relative utility of generate and select response formats for diagnostic and certifying examinations is discussed.
Literature on clinical judgments (e.g., Elstein et al., 1978; Feinstein, 1967; Weed, 1971) indicates that one of the most important aspects of patient care is making clinical judgments about the patient's problems. One essential aspect of clinical judgment is generating diagnostic hypotheses about problems. Therefore, an evaluation of students' clinical judgments should include some assessment of their ability to generate diagnostic hypotheses. The evaluation of clinical judgments is the focus of the exam discussed in this paper. The object of this exam was to evaluate students' abilities to formulate diagnostic hypotheses and patient problems. (For the purposes of this exam, both diagnostic hypotheses and patient problems were required on the problem list, as explained in the directions and sample case given to the students.) Previous efforts to measure students' abilities to formulate problem lists (Berner, 1976, 1977; Helfer and Slater, 1971) indicate that second-year medical students have difficulty with this task.


Purposes of the Study
Student difficulty in formulating problem lists may be a function of any of several factors, including (a) the student's knowledge of the content, (b) the student's clinical judgment ability, and (c) the nature of the particular case. The evaluation of these students can also be influenced by the examination format used. This study addresses the influence of examination formats and, therefore, attempts to minimize the effects of the other three factors. One way of determining the effect of the examining instrument on student performance is through the use of alternative examination formats. This study also considers the utility of two examination formats as they relate to the intended purpose of the examination. More specifically, this study examines student performance as a function of whether students generate their own problem lists or select problems from a long list of problem options.


Students who select from a list of possible problems must discriminate the correct problems from the incorrect problems. The processes of recognition and elimination can assist these students. Students who must generate their own lists cannot use the processes of elimination or recognition, but must rely primarily on recall processes. It is well documented that item-selection tests are easier than tests that require student-generated responses (…, 1972; …, 1970; Loftus & Loftus, 1976; McCarthy, 1966). However, the decision to use student-generated exams may not be that simple. While the major criterion in selecting an exam format should be the purpose of the exam, feasibility considerations often militate against the best choice. The evaluation literature has not fully explored the relationships between examination purpose (e.g., admissions decisions, degree counseling, or certification of competence) and the exam format selected. Nor does it test for qualitative differences between student answers as a function of examination format. This study analyzes the qualitative and quantitative differences between student-selected and student-generated answers in order to (1) determine whether such differences exist, and (2) explore the possible relationships between examination format and intended purpose.
Method

Data Source
A second-year medical class (N = 227) at a state-supported university was divided in half on the basis of the first letter of students' last names (i.e., A-K and L-Z). Mean-difference testing on other sections of the exam was conducted to evaluate the sampling distribution. These results indicated that the two groups did not differ significantly on scores from the other sections of this exam with patient cases, which required the student to determine patients' diagnoses and problems (X̄(A-K) = 61.893, s.d. = 9.759; X̄(L-Z) = 62.690, s.d. = 9.812; t = 0.615, p < .54). Thus, the two groups were initially comparable for the purposes of this study. The first group (i.e., A-K) was designated the generate group; the other group (i.e., L-Z) was designated the select group.
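The comparability check above compares two group means from summary statistics alone. The paper does not say which form of the t-test was used, so the sketch below assumes the classical pooled-variance (Student's) two-sample t-test; it also assumes a 113/114 split of the 227 students, since only the total N is reported.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic with pooled variance,
    computed from each group's mean, standard deviation, and size."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se, df


# Means and s.d.s are from the text; the group sizes 113 (A-K) and
# 114 (L-Z) are an ASSUMPTION, as the paper reports only N = 227.
t, df = pooled_t_from_summary(61.893, 9.759, 113, 62.690, 9.812, 114)
print(f"t({df}) = {t:.2f}")
```

Under these assumptions the statistic comes out at about 0.61 with 225 degrees of freedom, close to the reported 0.615. Because the two standard deviations and group sizes are nearly equal, an unpooled (Welch) version would give almost the same value.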


Procedure


The experimental examination, designed to evaluate the ability of medical students to generate or select problems, was integrated within a comprehensive examination that evaluated the ability to make clinical judgments. Other sections of this examination used both generate and select formats to evaluate the students' clinical judgments and knowledge in other areas of medicine. The experimental section was limited to the initial problem list for one patient. Students were given a clinical data base containing history and physical data for this patient and were asked to determine his problems or diagnostic hypotheses. The patient was admitted to the hospital for repair of bilateral inguinal hernias, with a history of congestive heart failure and diabetes mellitus. Half the students generated a list of problems/diagnoses (the generate group), and half of the students (the select group) selected the problems/diagnoses from a list of 67 possibilities. This list was composed of responses from previous second-year students. Both groups were told to indicate no more than 20 problems and/or diagnoses which they felt accurately conveyed the patient's present situation.


Prior to the administration of the exam, physicians developed an "ideal problem list" containing 13 common problems and diagnostic hypotheses based on the data presented for this case. Next, they determined the credit (i.e., 1, 2, 3, or 5 points) to be given to each problem or diagnostic hypothesis. One point was given for each relevant family history problem reported on the problem list. Two points were given for each problem reported directly in the data base (e.g., inguinal pain). Three points were given for each problem that required a little interpretation of the data (e.g., diabetes mellitus controlled by diet alone: his history of diabetes was given in the data base, as were the drugs he took, but the fact that it was controlled had to be interpreted from all of the data). Five points were given for each problem which required interpretation and synthesis of the data (e.g., organic heart disease, as evidenced by several signs and symptoms given separately in the history and physical). The ideal problem list was worth 36 points. The physicians decided that partial credit could be given for responses which identified the relevant signs or symptoms but were not completely integrated into problems or diagnostic hypotheses (e.g., diet control, without giving the reason). No credit was given for signs and symptoms which were merely repeated as stated in the data base and which should have been integrated further.


Each of the options given to the select group of students had also been categorized by these physicians according to problem and appropriate classification credit. The classification credits were: completely correct (1, 2, 3, or 5 points, as discussed above); partially correct, i.e., partially unintegrated problems (1, 2, 3, or 4 points); unintegrated (0 points); and inappropriate problem or diagnosis (0 points). Inappropriate problems or over-resolutions were diagnoses that could not be substantiated from the data base without the results of laboratory tests not yet available, or complications which could result from the present condition (e.g., post-operative pain from the hernia operation). For partially correct problems, credit was allowed to accumulate up to only part of the total point credit, regardless of how many partial answers were given.
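The credit rules above can be summarized as a small scoring function. This is a sketch under stated assumptions: the four-problem "ideal list" and the student responses are hypothetical (the actual 13-problem, 36-point list is not reproduced in the paper), and the cap on accumulated partial credit is assumed to be one point below the problem's full value, inferred from partial credits of 1 to 4 points against full credits of 1, 2, 3, or 5.

```python
from dataclasses import dataclass
from typing import Optional

COMPLETE = "completely correct"
PARTIAL = "partially correct"
UNINTEGRATED = "unintegrated"      # no credit
INAPPROPRIATE = "inappropriate"    # wrong or over-resolved; no credit


@dataclass
class Response:
    problem: Optional[str]  # ideal-list problem the response was matched to, if any
    category: str
    points: int = 0         # credit the physicians assigned to a partial answer


def score_problem(full_value, responses):
    """Credit earned on one ideal-list problem."""
    if any(r.category == COMPLETE for r in responses):
        return full_value
    partial = sum(r.points for r in responses if r.category == PARTIAL)
    # ASSUMPTION: accumulated partial credit stops one point short of full credit.
    return min(partial, full_value - 1)


def score_student(ideal_list, responses):
    """Return (points earned, percent of the ideal list's maximum)."""
    earned = 0
    for problem, full_value in ideal_list.items():
        matched = [r for r in responses if r.problem == problem]
        earned += score_problem(full_value, matched)
    return earned, 100 * earned / sum(ideal_list.values())


# Hypothetical ideal list using the credit levels described in the text.
ideal = {
    "family history of diabetes": 1,
    "inguinal pain": 2,
    "diabetes mellitus, controlled by diet": 3,
    "organic heart disease": 5,
}
responses = [
    Response("inguinal pain", COMPLETE),
    Response("diabetes mellitus, controlled by diet", PARTIAL, 2),  # "diabetes mellitus" only
    Response("organic heart disease", PARTIAL, 2),
    Response("organic heart disease", PARTIAL, 3),  # two partials sum to 5, capped at 4
    Response(None, INAPPROPRIATE),                  # e.g., "anxiety"
]
print(score_student(ideal, responses))
```

In this hypothetical case the student earns 8 of 11 possible points: full credit for inguinal pain, partial credit for the diabetes and heart problems, and nothing for the omitted family history or the inappropriate response.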




Once the examination was given, the physicians categorized the student-generated responses in the same fashion, according to problem and appropriate classification credit. Identical credit was given to identical responses. Any generated responses not on the list of options were assigned appropriate credit according to the same classification scheme. Scores were summed for each student, and an item analysis for each problem was conducted for both groups.
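The item analysis reported in Table 1 amounts to tabulating, for each problem and each group, the percentage of students whose answer fell into each credit category. A minimal sketch, assuming each student's answer to a problem is assigned exactly one category (the legible rows of Table 1 sum to roughly 100%); the ten classifications in the usage line are hypothetical.

```python
from collections import Counter

# Column categories of Table 1.
CATEGORIES = (
    "completely correct",
    "unintegrated, partial credit",
    "unintegrated, no credit",
    "wrong",
    "over-resolved",
    "omitted",
)


def category_percentages(classifications):
    """Percent of a group in each category for one problem.

    `classifications` holds one category label per student in the group.
    """
    counts = Counter(classifications)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    n = len(classifications)
    return {c: round(100 * counts[c] / n) for c in CATEGORIES}


# Hypothetical group of ten students for a single problem.
group = ["completely correct"] * 7 + ["unintegrated, partial credit"] * 2 + ["omitted"]
print(category_percentages(group))
```

Running the same function once for the select group and once for the generate group gives one row of a table like Table 1.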
Results


Overall, the select group scored significantly higher (t = 9.589, p < .05) than the generate group on the list of problems/diagnoses. The mean score and standard deviation were X̄ = 42%, s.d. = 11.364 for the generate group and X̄ = 57%, s.d. = 9.018 for the select group. The select group did not identify all of the patient's problems, even though they did indicate significantly more individual problems than the generate group (t = 2.92, p < .01). Although this difference is statistically significant, it may not be meaningful, since the select group averaged only one more problem than the generate group (generate: X̄ = 8.466, s.d. = 1.447; select: X̄ = 9.018, s.d. = 1.643). It is interesting to note that a majority of the students in both groups failed to mention the primary reason for the patient's hospital admission (herniorrhaphy) (93% select; 94% generate), his inguinal pain (70% select; 68% generate), and his nausea (88% select; 100% generate), although all of this information was clearly mentioned in the data base. The students selected or wrote an average of 12.823 and 11.826 separate responses, respectively.


There clearly is an important difference in the quality of answers between the select group and the generate group, as Table 1 indicates. The select group usually picked the completely correct answers for problems they believed to be relevant, while the generate group wrote partially integrated cues rather than completely correct answers for the four problems that required integration of cues. The difference in the quality of answers is especially pronounced for the problem requiring the greatest integration of cues into a diagnosis (i.e., arteriosclerotic cardiovascular disease) and for the problem requiring discrimination among levels of resolution (i.e., controlled diabetes mellitus). Only 7% of the generate group received full credit for the arteriosclerotic cardiovascular disease problem, whereas 76% of the select group selected the completely correct answer (z = 20.847, p < .001); the generate group wrote unintegrated cues, for which they received no credit, or partially integrated cues, for which they received partial credit. Ninety-seven percent of the select group indicated that his diabetes mellitus was controlled by diet alone (i.e., full credit), whereas only 43% of the generate group said this (z = 15.503, p < .01); 52% of the generate group indicated that he had diabetes mellitus but failed to say whether or how it was controlled. Thus, giving students a list of problem options facilitated their integration of cues into diagnostic problems.
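Table 1 reports z values from a test of the difference between two proportions. The paper does not give the formula or the exact group sizes, so the sketch below uses the common pooled-proportion z statistic as an assumption; it is not guaranteed to reproduce the published values. Its sign convention (positive when the first, select, group's percentage is higher) matches Table 1, where, for example, the 76% vs. 7% comparison carries a positive z.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic; positive when group 1's proportion is higher."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # both groups at 0% (or both at 100%): no difference to test
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 60 of 100 select students vs. 40 of 100 generate students.
z = two_proportion_z(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

For these hypothetical counts z is about 2.83 and the two-sided p is about .005. The explicit zero check matters for a table like Table 1, where many cells are 00% in both groups.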
However, students in the select group also selected diagnoses and problems that were not appropriate at that time, either unjustified resolutions or inappropriate problems. For example, "anxiety" was an inappropriate problem because the data base did not discuss the patient's mood or emotional state; 62% of the select group indicated inappropriate problems for this case, whereas only 22% of the generate group did so (z = 10.433, p < .05).


Discussion


The majority of students in this study had difficulty formulating a problem list. The most obvious reason for low scores was the omission of about 40% of the correct problems from their problem lists. There are several possible explanations for these omissions. Perhaps the students were careless and imprecise in describing problems. It is also possible that the students did not have a good idea of what a problem and a diagnostic hypothesis were, even though their curriculum emphasizes Weed's (1971) Problem-Oriented Medical Record. The students selected or wrote responses which did not reflect separate problems. Most often, they restated the same problem by listing various signs and symptoms without integrating them into a problem or diagnostic hypothesis. These beginning clinical students also sometimes failed to make the appropriate discriminations among levels of integration and resolution in making clinical judgments.


The results of this study suggest that giving a list of problems facilitates students' achievement, in that their overall scores are higher and, for the most part, the students indicate completely correct problem statements rather than partially correct, unintegrated signs and symptoms. The improved performance may be attributable to cueing as an aid to the recall of relevant information. Since it is impossible to determine the extent of the influence of cueing, a scoring correction for the select group was not feasible. However, the findings are consistent with earlier work by McCarthy (1966) on the evaluation of clinical performance, which is also indirectly concerned with the effects of cueing. McCarthy (1966) compared student performance on an oral examination and on a printed examination using lists of alternatives, on several aspects of clinical competence. In general, the scores were higher on the printed examination than on the oral exam for the same students. A second qualitative difference also occurred: the students selected more inappropriate diagnoses when given a list of alternatives than without such a list. Beginning clinical students select some problems at a higher level of resolution than is justified, such as diagnoses that cannot yet be made after reading only the data base and without the supporting laboratory data.


Even when given the additional advantage of cues from a list of possible problems, second-year medical students still have difficulty formulating problem lists. These results indicate that they have particular difficulty in (1) integrating cues into problems, (2) selecting the most appropriate levels of problem resolution, and (3) indicating all of the problems for the patient.


Given this difficulty and students' limited clinical experience, perhaps examinations can be given which facilitate their clinical judgments. Allowing students to select problems rather than requiring them to generate problems facilitates performance because of the processes of cueing and elimination. Thus, recognition tests may be appropriate for diagnostic examinations of beginning clinical students, since this type of item can identify students' weaknesses, as this study has shown, and is easier to score.


Presenting students with a list of alternatives may not be appropriate for all clinical examinations, because clinical competence is composed of various steps. Data gathering (i.e., what takes place during the history and physical) is an activity which is cued by what the patient presents and is not directly tied to a predetermined list of possibilities. This raises validity questions about instruments for evaluating data gathering which allow students to select their responses. Diagnostic workups which involve ordering laboratory tests require the integration of many cues. Physicians and other health professionals use the results of earlier information to cue the ordering of laboratory tests. Since numerous tests are available and the ease of performing some tests varies from laboratory to laboratory, physicians may order tests from a standard form listing the alternatives. Thus, allowing students to select laboratory tests from a list, especially if the cost of the tests is given, may be an appropriate way to test their ability. Forming a diagnosis or formulating a problem list from given alternatives, on the other hand, may not be appropriate, since these are cued by previous information and not by standard lists. If all aspects of clinical competency are to be evaluated, the instrument may be composed of a combination of student-generated and student-selected items, depending on the skills involved.

Yet the most frequently used item formats for the certification of students and the licensing of health professionals are multiple-choice questions and patient management problems (PMPs). Both of these formats allow examinees to select from a list and to employ the processes of recognition, cueing, and elimination. Multiple-choice questions and PMPs, therefore, may not be appropriate for certification and licensing, given the relationship between the intended purposes of the exam and the item format. Item formats which allow examinees to employ the processes of recognition and elimination may evaluate performance that falls short of what is required in actual clinical practice, and such formats may inflate examinee scores through cueing. Also, these item formats may not simulate reality as closely as possible. More open-ended item formats requiring generation of answers might be more appropriate for certain sections of certifying and licensing exams, since they simulate reality better.


In conclusion, the appropriateness of questions and test formats, rather than ease of administration and scoring, should be one of the primary considerations in designing an examination. The level of difficulty of a test is relative, depending on the level of discrimination required and on the students' abilities. If a test is too difficult, it cannot discriminate among those at the lower end of the distribution. If it is too easy, it cannot discriminate among those at the upper end of the distribution. Easier tests may be more appropriate for beginning clinical students. Allowing students to select responses makes the test easier and, therefore, may help to discriminate among students at the lower end of the distribution. This may be especially helpful for diagnostic examinations. Thus, the results of this study, together with the scoring-convenience factor, seem to indicate that student-selected item formats are appropriate for evaluating selected types of clinical competence, especially for beginning students or for diagnostic purposes. However, selecting answers may not be appropriate for all examinations.
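The point that a test that is too hard or too easy cannot discriminate can be made concrete with the classical upper-lower discrimination index, D = p_upper - p_lower, where p_upper and p_lower are the proportions answering an item correctly in the top- and bottom-scoring groups. This is a standard item-analysis statistic offered as an illustration, not one reported in this study; the 27% group size is a conventional choice, and the scores below are hypothetical.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    total_scores: each examinee's total test score.
    item_correct: 1 if that examinee answered the item correctly, else 0.
    fraction: share of examinees placed in each extreme group.
    """
    n = len(total_scores)
    k = max(1, round(fraction * n))
    # Rank examinees by total score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical class of ten examinees with total scores 0..9.
scores = list(range(10))
too_hard = [0] * 10            # nobody answers correctly
too_easy = [1] * 10            # everybody answers correctly
moderate = [0] * 5 + [1] * 5   # only the stronger half answers correctly
for name, item in [("too hard", too_hard), ("too easy", too_easy), ("moderate", moderate)]:
    print(name, discrimination_index(scores, item))
```

Both extreme items give D = 0, while the moderately difficult item separates the upper and lower groups completely (D = 1.0).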

Table 1
Percentage Point Credit Analyses by Problem and by Group
(Showing z Values Which Resulted From the Test of the Difference Between the Two Proportions)

Each cell gives Select % / Generate %, followed by the z value in parentheses where one is reported.

Type and Name of Problem | Completely Correct | Unintegrated, Partial Credit | Unintegrated, No Credit | Wrong, No Credit | Over-resolved, No Credit | Omitted Answer

Major Acute Problems
Angina pectoris | ? / ? (5.623*) | 00 / 03 (-2.644*) | ? / ? | ? / ? | ? / ? | ? / ?
Arteriosclerotic cardiovascular disease with various symptoms | 76 / 07 (20.847*) | ? / ? | ? / ? | ? / ? | ? / ? | ? / ?
Inguinal pain | 30 / 31 (.231) | 00 / ? (.662) | ? / 00 | 00 / 00 | 00 / 00 | 70 / 68 (.460)
Herniorrhaphy | 03 / 03 | 00 / 03 (2.641*) | 00 / 00 | 03 / 00 (2.644) | 01 / 00 (.662) | 93 / 94

Major Chronic Problems
Diabetes mellitus, controlled | 97 / 43 (15.503*) | ? / 52 (-12.493*) | ? / ? | ? / ? | 02 / 00 (2.148*) | ? / ?
Bilateral inguinal hernias | 94 / ? (-1.102) | ? / ? | ? / ? | ? / ? | ? / ? | ? / ?

Minor Chronic Problems
Bilateral basal rales | ? / ? | ? / ? | ? / ? | ? / ? | ? / ? | ? / ?
Cigarette smoking | ? / ? | ? / ? | ? / ? | ? / ? | ? / ? | ? / ?
Nausea | ? / ? | ? / ? | ? / ? | ? / ? | ? / ? | 88 / 100
Optic fundi with arteriolar nicking | 75 / 42 (7.556*) | 10 / 37 (-7.141*) | 00 / 00 | ? / ? | 00 / 00 | 15 / 20 (-1.402)
Pigmented raised skin lesion | 73 / 82 (-2.305*) | 00 / 10 (-5.011*) | 04 / 02 (1.240) | ? / ? | 00 / 00 | 09 / 06 (1.213)
Family history of diabetes mellitus, heart disease | 81 / 59 (5.257*) | 00 / 00 | 00 / 12 | ? / ? | 00 / 00 | 19 / 29 (-2.422*)
Past history of TUR of prostate | 96 / 79 (5.654) | 01 / 15 (-5.678*) | ? / 01 (.662) | ? / ? | 00 / 01 (.662) | 03 / 04

Wrong problems indicated | | | | 62 / 22 (10.433*) | |

* = significant, p < .05
? = illegible in the source copy